Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network
Authors
Abstract
In this paper, we propose a quasi-periodic parallel WaveGAN (QPPWG) waveform generative model, which applies a quasi-periodic (QP) structure to a parallel WaveGAN (PWG) model using pitch-dependent dilated convolution networks (PDCNNs). PWG is a small-footprint GAN-based raw waveform generative model whose generation time is much faster than real time because of its compact model and its non-autoregressive (non-AR), non-causal mechanisms. Although PWG achieves high-fidelity speech generation, its generic and simple network architecture lacks pitch controllability for an unseen auxiliary fundamental frequency ($F_{0}$) feature such as a scaled $F_{0}$. To improve the pitch controllability and speech modeling capability, we apply a QP structure with PDCNNs to PWG, which introduces pitch information to the network by dynamically changing the network architecture corresponding to the auxiliary $F_{0}$ feature. Both objective and subjective experimental results show that QPPWG outperforms PWG when the auxiliary $F_{0}$ feature is scaled. Moreover, analyses of the intermediate outputs of QPPWG also show better tractability and interpretability of QPPWG, which respectively models spectral and excitation-like signals using the cascaded fixed and adaptive blocks of the QP structure.
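The abstract says the PDCNNs change the network structure dynamically according to the auxiliary $F_{0}$ feature, but it does not give the rule. The sketch below is one plausible reading, not the authors' implementation: it assumes the dilation at each time step is inversely proportional to $F_{0}$, so that the taps of a dilated convolution land roughly a fixed fraction of a pitch period apart. The function names, the `dense_factor` parameter, the 24 kHz sample rate, and the fallback to the base dilation for unvoiced frames (`F0 = 0`) are all assumptions made for illustration.

```python
import numpy as np


def pitch_dependent_dilations(f0_hz, sample_rate=24000, dense_factor=4, base_dilation=1):
    """Per-time-step dilation sizes derived from an F0 contour.

    Hypothetical helper: assumes dilation = base_dilation * round(fs / (F0 * dense_factor)).
    Unvoiced steps (F0 <= 0) keep the fixed base dilation.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    dilations = np.full(f0.shape, base_dilation, dtype=int)
    voiced = f0 > 0
    scale = np.rint(sample_rate / (f0[voiced] * dense_factor)).astype(int)
    dilations[voiced] = base_dilation * np.maximum(scale, 1)
    return dilations


def pitch_dependent_conv(x, dilations, weights):
    """Three-tap non-causal convolution whose tap spacing varies per time step.

    Indices that fall outside the signal are clamped to the nearest edge.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    past = np.clip(t - dilations, 0, len(x) - 1)
    future = np.clip(t + dilations, 0, len(x) - 1)
    w_past, w_now, w_future = weights
    return w_past * x[past] + w_now * x + w_future * x[future]


if __name__ == "__main__":
    f0 = np.array([0.0, 100.0, 200.0])
    print(pitch_dependent_dilations(f0))        # dilations for the original F0
    print(pitch_dependent_dilations(2.0 * f0))  # dilations when F0 is scaled by 2
```

Under this assumption, scaling $F_{0}$ changes the tap spacing directly, which is one way a network could follow an unseen scaled $F_{0}$ rather than relying only on what a fixed architecture learned in training.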
Related resources
A Nonlinear Autoregressive Model with Exogenous Variables Neural Network for Stock Market Timing: The Candlestick Technical Analysis
In this paper, the nonlinear autoregressive model with exogenous variables as a new neural network is used for timing of the stock markets on the basis of the technical analysis of Japanese Candlestick. In this model, the “nonlinear autoregressive model with exogenous variables” is an analyzer. For a more reliable comparison, here (like the literature) two approaches of Raw-based and Signal-ba...
Non-melanoma skin cancer diagnosis with a convolutional neural network
Background: The most common types of non-melanoma skin cancer are basal cell carcinoma (BCC), and squamous cell carcinoma (SCC). AKIEC -Actinic keratoses (Solar keratoses) and intraepithelial carcinoma (Bowen’s disease)- are common non-invasive precursors of SCC, which may progress to invasive SCC, if left untreated. Due to the importance of early detection in cancer treatment, this study aimed...
WaveNet: A Generative Model for Raw Audio
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-...
Non-radial model of dynamic DEA with the parallel network structure
In this article, a non-radial method of dynamic DEA with a parallel network structure is presented and used to calculate relative efficiency measures when inputs and outputs do not change equally. In this model, the divisions of the DMU under evaluation are arranged in parallel, but its dynamic structure is assumed to be in series. Since in real applications there are undesirable inputs an...
A new type-II fuzzy logic based controller for non-linear dynamical systems with application to 3-PSP parallel robot
Type-II fuzzy logic has shown its superiority over traditional fuzzy logic when dealing with uncertainty. Type-II fuzzy logic controllers are, however, newer and more promising approaches that have recently been applied to various fields because of their significant contribution, especially when noise (an important instance of uncertainty) emerges. During the design of type-I fuz...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2021
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2021.3051765